semantic similarity! [wip] #77

Open
wants to merge 5 commits into base: main

DronHazra
Collaborator

pr includes:

  • code to compute embeddings (via cohere api)
  • storing said embeddings in a new db collection
  • computing semantic similarity (cosine similarity between embedding vectors) between the current update and all other updates from the author
  • sorting and taking the top 3
  • displaying these updates at the bottom of every update (not on page load, just once everything is computed)
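
for reference, a minimal sketch of the similarity measure named in the list above. `cosineSimilarity` is a hypothetical helper name, not necessarily what the pr code calls it, and it assumes both embeddings are `Float32Array`s of the same length:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; returns 0 if either vector is all zeros.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
    if (a.length !== b.length) {
        throw new Error("embedding lengths differ");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    const denom = Math.sqrt(normA) * Math.sqrt(normB);
    return denom === 0 ? 0 : dot / denom;
}
```

ranking the author's other updates then amounts to calling this once per stored embedding and keeping the highest scores.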

everything works correctly but it's all a bit jank:

  • computing the semantic similarities is done on the backend, but typescript is really not the right tool here.

i'm computing dot products one by one on typed arrays (which are much faster than normal arrays, but still). this could easily be vectorized in numpy (pack the embeddings into an embedding matrix, then multiply that matrix by the current embedding to get all the similarities at once), but js doesn't have a good library for it.

i'll probably make a little python thing to compute these, and it'll probably be useful if there's any other data-related stuff we want to do in the future.
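
until something like that exists, the same "matrix times vector" shape can be approximated in plain typescript. this is only a sketch under assumptions: the function names are made up for illustration, embeddings are L2-normalized once when packed (so cosine similarity reduces to a dot product), and the matrix is stored row by row in one flat `Float32Array`:

```typescript
// Scale a vector to unit length (all-zero vectors stay all zeros).
function normalize(vec: ArrayLike<number>): Float32Array {
    let sumSquares = 0;
    for (let i = 0; i < vec.length; i++) sumSquares += vec[i] * vec[i];
    const norm = Math.sqrt(sumSquares) || 1;
    const out = new Float32Array(vec.length);
    for (let i = 0; i < vec.length; i++) out[i] = vec[i] / norm;
    return out;
}

// Pack normalized embeddings row by row into one flat Float32Array.
function packEmbeddings(embeddings: ArrayLike<number>[], dim: number): Float32Array {
    const packed = new Float32Array(embeddings.length * dim);
    embeddings.forEach((vec, row) => {
        if (vec.length !== dim) throw new Error("unexpected embedding length");
        packed.set(normalize(vec), row * dim);
    });
    return packed;
}

// One pass over the packed matrix: out[r] = dot(row r, query).
// The query must also be normalized for the result to be a cosine similarity.
function similarities(packed: Float32Array, dim: number, query: Float32Array): Float32Array {
    const rows = packed.length / dim;
    const out = new Float32Array(rows);
    for (let r = 0; r < rows; r++) {
        const offset = r * dim;
        let dot = 0;
        for (let i = 0; i < dim; i++) dot += packed[offset + i] * query[i];
        out[r] = dot;
    }
    return out;
}
```

this doesn't get numpy's vectorized kernels, but it avoids recomputing norms on every comparison and keeps the data contiguous in memory.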

  • sorting is awfully inefficient

i should be implementing a top-k selection rather than sorting the entire array of similarities. it would be much faster (the algorithm i'm imagining is O(n log k); the dead simple one is just three linear scans of the array), but at the moment the dominant cost is computing the cosine similarities.
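
a rough sketch of what a top-k selection could look like. note this is not the heap-based O(n log k) version: it keeps a small sorted buffer of the best k indices, which is O(n·k) and for k = 3 is effectively one linear pass. `topK` is a hypothetical name:

```typescript
// Return the indices of the k largest scores, best first,
// without sorting the whole array.
function topK(scores: ArrayLike<number>, k: number): number[] {
    if (k <= 0) return [];
    const best: number[] = []; // indices, sorted by descending score
    for (let i = 0; i < scores.length; i++) {
        // Skip anything that cannot beat the current k-th best.
        if (best.length === k && scores[i] <= scores[best[k - 1]]) continue;
        // Walk back from the end to find where index i belongs.
        let pos = best.length;
        while (pos > 0 && scores[best[pos - 1]] < scores[i]) pos--;
        best.splice(pos, 0, i);
        if (best.length > k) best.pop();
    }
    return best;
}

// Usage: indices of the three most similar updates.
const exampleTop = topK([0.1, 0.9, 0.3, 0.7, 0.5], 3); // [1, 3, 4]
```

for larger k a min-heap would be the better fit; for the top 3 the buffer is simpler and has less overhead.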

  • the design is also a bit jank

not entirely sure what the best ui is for displaying these within updately's design system. would need some help on this.

@vercel

vercel bot commented Jan 20, 2023

Someone is attempting to deploy a commit to a Personal Account owned by @wwsalmon on Vercel.

@wwsalmon first needs to authorize it.
